DBSCAN Clustering Algorithm
Clustering is an important topic in business because it helps us reduce a set of features to a typology: a small number of clusters which, when the data allow it, can tell us more about our topic of interest. In the data science literature it is usually presented as a dimension reduction technique, but in science, and even in data science, it can also reveal patterns in the data that are not obvious at first glance. Imagine you have some features describing students: their marks, their personality traits, their ability scores, their motivation. Clustering could reveal completely new types of (un)successful students. One type could be someone with high ability and low motivation (an underachiever); another could be someone with high motivation and really good marks but low ability (an overachiever). This can be done simply by clustering; the cluster names (overachiever, underachiever) are essentially our interpretations of the clusters.
DBSCAN Clustering Algorithm in Machine Learning - KDnuggets
In 2014, the DBSCAN algorithm received the Test of Time Award (an award given to algorithms that have received substantial attention in theory and practice) at the leading data mining conference, ACM SIGKDD. Cluster analysis is an unsupervised learning method that separates data points into groups such that points in the same group have similar properties and points in different groups have different properties in some sense. It comprises many different methods based on different distance measures. Fundamentally, all clustering methods follow the same approach: first we calculate similarities between the data points, and then we use them to cluster the points into groups. Here we will focus on the density-based spatial clustering of applications with noise (DBSCAN) method. If you are unfamiliar with clustering algorithms, I advise you to read the Introduction to Image Segmentation with K-Means clustering.
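To make the two-step idea concrete (first compute distances, then use them to form groups), here is a minimal pure-Python sketch of DBSCAN. It is an illustration under stated assumptions, not a reference implementation: the function names `pairwise_distances` and `dbscan`, the parameter names `eps` (neighbourhood radius) and `min_pts` (minimum neighbours for a dense point, counting the point itself), the choice of Euclidean distance, and the toy data are all chosen here for the example.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def pairwise_distances(points):
    """Step 1: Euclidean distance between every pair of points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def dbscan(points, eps, min_pts):
    """Step 2: group points using the precomputed distances.

    Returns one integer label per point; NOISE (-1) marks outliers.
    """
    dist = pairwise_distances(points)
    n = len(points)
    # A point's neighbourhood includes the point itself (distance 0).
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]

    labels = [None] * n  # None means "not visited yet"
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a dense ("core") point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is also dense: keep growing
    return labels


# Toy data (made up for illustration): two tight groups and one outlier.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
    (9.0, 0.0),
]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Note that the number of clusters is not specified in advance: it follows from the density of the data, and the isolated point is reported as noise rather than forced into a group. This sketch computes all pairwise distances, which is fine for small data sets but scales quadratically with the number of points.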